Introduction
It is widely held that the US has a polarized, two-party political landscape, with Democrats on one side and Republicans on the other. Our project aims to test and support this assumption with Quotebank data by visualizing the political landscape as a network.
The dataset is rich in political content: we found that 13% of the quotations are uttered by politicians, and 48% of those political quotations come from the US. We therefore decided to analyze the network that politicians form through their quotations and to tell a data story about the ecosystem of the political world, focusing on the United States.
This data story is based on that network, using the bi-directional frequency, sentiment and topics of the mentions US politicians make of other politicians (themselves, other US politicians, or politicians worldwide).
The network connections are analyzed in depth to reveal the structure of central nodes and communities/hubs. The analysis is also extended with a Natural Language Processing technique, sentiment analysis, and an unsupervised machine learning technique, Latent Dirichlet Allocation (LDA) topic clustering, to reveal more information (emotion and recurrent topics) in the mentions.
US-Network
To capture interactions between politicians, we built a directed network graph in which each node is a politician and each edge represents mentions of one politician by another. The weight of each edge is the number of such mentions. We applied the Kernighan-Lin algorithm to split the graph into two communities. Each node has two attributes: the party the politician belongs to (Democratic, Republican or directly affiliated parties) and the community it has been assigned to. The goal of this part of our work was to check whether US politicians interact more within their own party than across parties.
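The construction described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the use of networkx, the toy mention pairs, the party labels and the helper names `build_mention_graph` and `symmetrize` are all assumptions. networkx's Kernighan-Lin implementation only accepts undirected graphs, so the sketch bisects a symmetrized copy in which the weights of both directions are summed.

```python
from collections import Counter

import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Hypothetical toy data: one (speaker, mentioned politician) pair per quotation.
mentions = [
    ("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("C", "B"),
    ("D", "E"), ("E", "D"), ("E", "F"), ("F", "D"), ("F", "D"),
    ("C", "D"),
]
party = {"A": "Democratic", "B": "Democratic", "C": "Democratic",
         "D": "Republican", "E": "Republican", "F": "Republican"}


def build_mention_graph(pairs, party_of):
    """Directed graph: edge speaker -> target, weight = number of mentions."""
    graph = nx.DiGraph()
    for (speaker, target), count in Counter(pairs).items():
        graph.add_edge(speaker, target, weight=count)
    nx.set_node_attributes(graph, party_of, "party")
    return graph


def symmetrize(graph):
    """Undirected copy in which the weights of both directions are summed."""
    undirected = nx.Graph()
    undirected.add_nodes_from(graph.nodes(data=True))
    for u, v, w in graph.edges(data="weight"):
        if u == v:
            continue  # self-mentions carry no information for a bisection
        previous = undirected.get_edge_data(u, v, default={"weight": 0})["weight"]
        undirected.add_edge(u, v, weight=previous + w)
    return undirected


G = build_mention_graph(mentions, party)
U = symmetrize(G)

# Kernighan-Lin is a heuristic: the seed only makes the run reproducible.
side_a, side_b = kernighan_lin_bisection(U, weight="weight", seed=0)

# Store the assigned community (0 or 1) next to the party on every node.
community = {node: (0 if node in side_a else 1) for node in G}
nx.set_node_attributes(G, community, "community")
```

The two sides returned by the bisection are balanced in size, and which side is labeled 0 or 1 is arbitrary.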
American politician network based only on mentions
The goal of this plot is to give a feel for the structure of our network. Only the 20 most central nodes of each community and the main edges between them are shown. The size of each node represents its PageRank centrality. To make the plot easier to interpret, we rendered the graph undirected and summed the weights of the edges between each pair of nodes. The width of each edge then represents this summed weight. Feel free to explore!
Who is the most popular and/or gossipy?
Comparing the PageRank centrality of the main nodes with the out-degree distribution reveals some interesting differences. As might be expected, Trump is both the most talkative and the most mentioned politician. He is followed by Obama, who is mentioned very often but does not mention other politicians much. In the same vein, George W. Bush is mentioned a lot but does not even appear among the 20 most talkative politicians. The third graph, the total degree distribution of the nodes, shows that the distribution follows a power law, as expected for this type of large social network graph.
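To make the difference between "most mentioned" and "most talkative" concrete, here is a small sketch under stated assumptions: networkx is assumed, the four-node graph and its weights are invented, and `top` is a hypothetical helper. PageRank rewards politicians who receive many mentions, while the weighted out-degree counts the mentions a politician makes.

```python
from collections import Counter

import networkx as nx

# Hypothetical toy graph: edge (speaker, target, number of mentions).
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("A", "B", 5), ("A", "C", 3), ("A", "D", 2),
    ("B", "C", 1), ("C", "B", 1), ("D", "B", 4),
])

# "Popular": PageRank, which flows along the mentions a politician receives.
pagerank = nx.pagerank(G, weight="weight")

# "Gossipy": weighted out-degree, the number of mentions a politician makes.
out_strength = dict(G.out_degree(weight="weight"))


def top(scores, k):
    """Return the k nodes with the highest score (hypothetical helper)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


most_mentioned = top(pagerank, 3)
most_talkative = top(out_strength, 3)

# Total degree distribution: how many nodes have each (in + out) degree.
# Plotting these counts on log-log axes is a common visual check for a
# heavy-tailed, power-law-like shape.
degree_counts = Counter(degree for _, degree in G.degree())
```

In this toy graph, node B plays the role of a politician who is mentioned a lot but rarely mentions others: it ranks first by PageRank but has the lowest out-strength.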
Do Democrats and Republicans really form "parties"?
We computed the correlation between the politicians' party and the network community they were assigned to. The result is statistically significant at the 5% level in a two-sided test (Spearman's correlation coefficient = 0.03, p-value = 0.012). We can therefore say that there is a statistically significant, although weak, correlation between actual Republican/Democratic affiliation and the community bisection of our network. This suggests that US Republican and Democratic politicians tend to mention members of their own party more often than members of the opposite party in the media.
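The test above can be reproduced in a few lines. The following is only a sketch: it assumes `scipy.stats.spearmanr`, and the eight labels are invented to show the mechanics, not taken from our data. Both variables are encoded as 0/1, and because the community labels produced by a bisection are arbitrary, only the magnitude of the coefficient is meaningful.

```python
from scipy.stats import spearmanr

# Hypothetical labels for eight politicians, in the same order in both lists.
party = [0, 0, 0, 0, 1, 1, 1, 1]       # 0 = Democratic, 1 = Republican
community = [0, 0, 0, 1, 1, 1, 1, 0]   # side assigned by the bisection

# spearmanr returns the coefficient and a two-sided p-value.
rho, p_value = spearmanr(party, community)

# Swapping the community labels 0 <-> 1 flips the sign of rho,
# so we report its absolute value.
strength = abs(rho)

ALPHA = 0.05  # significance level used in the text
is_significant = p_value < ALPHA
```

With six of eight politicians on the side that matches their party, this toy example gives a coefficient of 0.5.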
This plot shows the main nodes of each community in a bipartite layout corresponding to the community bisection. As before, the fill color represents party membership, the outline color represents the bisection community, and the edge width represents the edge weight.
Conclusion: do the communities match the real parties?
How narcissistic can you be?
We also thought it would be interesting to visualize how often politicians mention themselves, so we computed the self-mention frequency as the fraction of self-mentions among all mentions for each politician. This plot shows the 40 politicians with the highest self-mention frequencies. Color represents party membership and node size represents the frequency. Here are revealed the most narcissistic American politicians.
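As a rough sketch of this computation, assuming that "all mentions" means all mentions made by the politician (self-mentions included), the frequency can be derived directly from (speaker, mentioned politician) pairs. The function names and the toy data are hypothetical.

```python
from collections import Counter


def self_mention_frequency(pairs):
    """Fraction of each speaker's mentions that refer to the speaker."""
    total = Counter()
    own = Counter()
    for speaker, target in pairs:
        total[speaker] += 1
        if speaker == target:
            own[speaker] += 1
    return {speaker: own[speaker] / total[speaker] for speaker in total}


def top_self_mentioners(pairs, k=40):
    """The k speakers with the highest self-mention frequency."""
    frequency = self_mention_frequency(pairs)
    return sorted(frequency.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy data: one (speaker, mentioned politician) pair per quotation.
mentions = [
    ("A", "A"), ("A", "B"), ("A", "C"), ("A", "B"),
    ("B", "B"), ("B", "A"),
    ("C", "A"),
]
frequency = self_mention_frequency(mentions)
ranking = top_self_mentioners(mentions, k=2)
```

A politician with very few mentions can reach a high frequency by chance, so a minimum number of mentions may be worth requiring before ranking.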
Sentiment & Topics: plot (dropdown list and colored arrows): What do they like to talk about?
Bias: histogram: Question: how could biases change the results?
Make World Great Again
By the way, how do you think US politicians talk about foreign politicians?
We decided to visualize this with a world map of the quotes from US politicians about the different countries of the world. In the plot below, you can select the party whose mentions you want to visualize and the time frame (2015-2020). You can zoom in and choose the country you are interested in. The width of the lines represents the frequency for the selected year, and the color of the country is the mean sentiment across all politicians or across the selected party's politicians. Have fun and see how it evolves and differs across years and parties!
What is most remarkable here? First, in general, the prize for the most mentioned country goes to ... Russia (not very surprising... right?). Except for one year and one party ... Can you find it? Hint: when was there a presidential crisis? Hint 2 (but you have certainly already found it!): it was in South America.
Second, it seems that Republicans and Democrats do not really think alike. Careful: the "All" parties view shows an average of the sentiment, not what every party thinks. A neutral sentiment could result from opposite sentiments of different parties canceling out, which is why Democrats and Republicans can also be visualized separately. Do you like politics? Are you good at history? Feel free to explore and work out the reasons behind the worst and best sentiments.
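The cancellation effect is easy to see on a toy example. The sketch below is illustrative only: the records, the sentiment scores in [-1, 1], the country name and the `mean_sentiment` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (party of the speaker, mentioned country, sentiment).
records = [
    ("Democratic", "Country X", 0.5),
    ("Democratic", "Country X", 0.5),
    ("Republican", "Country X", -0.5),
    ("Republican", "Country X", -0.5),
]


def mean_sentiment(rows, party=None):
    """Mean sentiment per country, for one party or for all parties (None)."""
    scores = defaultdict(list)
    for row_party, country, sentiment in rows:
        if party is None or row_party == party:
            scores[country].append(sentiment)
    return {country: mean(values) for country, values in scores.items()}


all_parties = mean_sentiment(records)
democratic = mean_sentiment(records, party="Democratic")
republican = mean_sentiment(records, party="Republican")
```

Here the "All" average for Country X is neutral, even though each party taken separately has a clearly positive or clearly negative mean sentiment.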
Conclusion
The polarization of United States politics is already backed by multiple studies, but many of them were conducted indirectly, via surveys on ideology or changes in public behavior (NW et al., 2014; Wilson et al., 2020). The network at the level of individual politicians is a direct reflection of the political structure, with quantitative metrics such as connectivity or betweenness centrality. On a global level, several major events also occurred during the time span of Quotebank (2015-2020), such as Brexit, the US-China trade war and the COVID pandemic, and it would be interesting to see whether they are reflected in the global political network.